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Abstract 

Background: Taxa that harbor natural phenotypic variation are ideal for ecological genomic approaches aimed 
at understanding how the interplay between genetic and environmental factors can lead to the evolution of 
complex traits. Lasioglossum albipes is a polymorphic halictid bee that expresses variation in social behavior 
among populations, and common-garden experiments have suggested that this variation is likely to have a 
genetic component. 

Results: We present the L albipes genome assembly to characterize the genetic and ecological factors associated with 
the evolution of social behavior. The de novo assembly is comparable to other published social insect genomes, with 
an N50 scaffold length of 602 kb. Gene families unique to L albipes are associated with integrin-mediated signaling 
and DNA-binding domains, and several appear to be expanded in this species, including the glutathione-s-transferases 
and the inositol monophosphatases. L albipes has an intact DNA methylation system, and in silico analyses suggest that 
methylation occurs primarily in exons. Comparisons to other insect genomes indicate that genes associated with 
metabolism and nucleotide binding undergo accelerated evolution in the halictid lineage. Whole-genome 
resequencing data from one solitary and one social L albipes female identify six genes that appear to be rapidly 
diverging between social forms, including a putative odorant receptor and a cuticular protein. 

Conclusions: L albipes represents a novel genetic model system for understanding the evolution of social behavior. It 
represents the first published genome sequence of a primitively social insect, thereby facilitating comparative genomic 
studies across the Hymenoptera as a whole. 



Background 

Social behavior holds special distinction in evolutionary 
biology because it represents a major transition from an 
individual to a coordinated group [1]. Despite this add- 
itional layer of complexity, the same genetic and genomic 
methods used to study complex behaviors in model sys- 
tems can be applied to the study of sociality. 

One such approach is to combine genetic and ecological 
studies to understand how genes and the environment 
shape the striking diversity of behaviors that occur within 
and between species. Taxa harboring natural variation in 
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a trait of interest - whether it be morphological, physio- 
logical, or behavioral - are ideal because they enable quan- 
titative and population genetic studies to elucidate some 
of the underlying genetic components [2,3]. Over the past 
few years, this approach has helped to illuminate some of 
the genetic and ecological factors associated with repeated 
evolution of both morphological and behavioral traits: ex- 
amples include benthic and limnetic forms in sticklebacks 
[4,5], coat color in mice [6,7], mimicry rings in Heliconius 
butterflies [8], and song performance in crickets [9,10]. 

Halictid bees (Hymenoptera: Halictidae) or sweat bees' 
are small- to medium-sized bees with a cosmopolitan dis- 
tribution and over 4,000 described species. They are mass 
provisioners and pollen feeders, and nest primarily in the 
ground. Most species are solitary, but many are primitively 
eusocial (as defined by [11]). These species produce col- 
onies composed of a facultatively sterile worker caste and 
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only one or a few reproductive individuals that are not 
morphologically distinguishable from each other [11,12]. 

Halictids are particularly useful models for behavior be- 
cause they harbor extensive variation in social behavior 
both within and between species [13,14]. Within the 
Halictidae, eusociality has evolved at least twice, with 
many subsequent reversions [15,16]. The origins of eu- 
sociality in this group are relatively young (approximately 
22 to 35 millions of years ago; [15,16]), and perhaps be- 
cause of this, a great deal of variation in social behavior 
exists, ranging from solitary to communal to eusocial 
(reviewed in [13,14]). Interestingly, some of these species 
are socially polymorphic, and females are capable of pro- 
ducing either solitary or social nests. These social poly- 
morphisms can occur across geographical gradients or 
even within populations [17-20]. 

One species in particular represents an ideal system for 
exploring the genetic mechanisms underlying social be- 
havior: Lasioglossum albipes. This socially polymorphic 
species is solitary in inland localities in France and 
Germany, but eusocial in southwestern France where 
the climate is warmer and nests are initiated earlier in 
the summer [11,18]. The life cycle of the social females 
is typical for a eusocial species with a univoltine life 
history (that is, one generation per year). Females found 
a nest in the spring and rear a first brood of workers 
that then help rear a second brood of reproductive 
males and females that subsequently mate and diapause 
through the winter to repeat the cycle the following 
spring. The life cycle of the solitary populations is the 
same except that the first, eusocial worker brood is 
not produced. Common-garden experiments were con- 
ducted with L. albipes in which both social forms were 
reared in the laboratory under the same conditions and 
also under complementary photoperiods; the typical be- 
haviors for each population remained the same in the 
lab as in the field, suggesting that this behavioral poly- 
morphism is likely to have an underlying genetic com- 
ponent [21]. This system thus provides an excellent 
model for studying the ecological and genetic factors 
associated with the evolution of social behavior. 

Here we present the draft genome of this socially poly- 
morphic bee, the first halictid for which genomic re- 
sources have been developed. We compare its genome 
sequence to other published insect genomes and identify 
a number of interesting patterns that can be tested in fu- 
ture studies. 

Results and discussion 

Genome sequencing and assembly 

DNA was isolated and assembled from two haploid 
males collected from a solitary population in Leysin, 
Switzerland (Additional file 1). Seven paired-end librar- 
ies with insert sizes ranging from 170 bp to 10 kb were 



constructed and sequenced on Illumina HiSeq 2000 and 
GAIIx (10 kb libraries) systems. To improve scaffolds, 
an additional 10 kb Illumina mate -pair library was con- 
structed from a pool of 20 females collected from mul- 
tiple French and Swiss populations (Additional file 1). 
Before filtering, this produced 53.62 Gb of raw data. 
Low quality reads, reads with a high proportion of Ns or 
poly-A structures, overlapping paired ends, and PCR du- 
plicates were filtered prior to assembly. Post-filtering, 
39.81 Gb of raw reads remained (Table 1). 

The L. albipes genome size is estimated at 416 Mb. The 
final assembly has an N50 scaffold length of 602 kb and a 
total length of 350.8 Mb (Table 2). The genome contains a 
high degree of repetitive elements, which comprise 32.71% 
of the final assembly (Additional file 2). Completeness of 
the assembly was assessed using the CEGMA pipeline 
[22]. Of the 248 core eukaryotic genes (CEGs), 243 were 
completely assembled in the L. albipes genome. The clos- 
est relative to L. albipes with a genome sequence is the 
honey bee, A. mellifera, and our results are comparable to 
that of the A. mellifera v4.5 genome assembly and the 
other sequenced hymenopterans (Additional file 3) ([23]). 

RNA sequencing 

RNA sequencing was performed on four pooled adult fe- 
males collected from field sites in France and Switzerland 
(Additional file 1). RNA was extracted and a 2 x 100 
paired-end library was sequenced on an Illumina HiSeq 
2000. The resulting 35,207,669 reads were mapped back 
to the reference genome (approximately 230x coverage 
assuming a 30 Mb transcriptome) using Tophat [24], 
and 23,308 transcript models were generated using 
Cufflinks [25]. These data were incorporated into the 
gene annotation pipeline (see below for details) to re- 
fine gene annotation. 

Gene annotation 

A combination of RNA-sequencing, de novo, and homology- 
based gene predictions generated an official gene set in- 
cluding 13,448 predicted genes (Additional files 4 and 5). 
Orthology was assigned using reciprocal best BLASTs 
(Additional file 6). Non-coding RNAs (ncRNAs) are sum- 
marized in Additional file 7. Treefam was used to cluster 
genes into 9,614 gene families using information from 
six additional hymenopterans (the honey bee, Apis melli- 
fera, four ants, Acromyrmex echinatior, Solenopsis invicta, 
Camponotus floridanus, Harpegnathos saltator, and the 
parasitoid wasp, Nasonia vitripennis), plus one additional 
insect as an outgroup to the Hymenoptera (the dipteran 
fruit fly Drosophila melanogaster). Multiple alignments 
of protein sequences were generated for each gene 
family across these eight insect species, and the four-fold 
degenerate sites were used to reconstruct the phylogeny 
(Figure 1). 
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Table 1 Data used for genome assembly and scaffolding 


Insert size (bp) 


Read length (bp) 


Raw data (Gb) 


Coverage (X) 


Data after filtering (Gb) 


Coverage (x) 


GC content (%) 


200 


100 


8.28 


19.90 


7.11 


17.09 


40.25 


500 


100 


14.36 


34.51 


9.64 


23.17 


39.85 


800 


100 


8.06 


19.36 


5.74 


13.80 


42.12 


2 kb 


49 


5.65 


13.59 


4.70 


11.30 


45.35 


5 kb 


49 


6.77 


16.30 


5.61 


13.49 


45.90 


10 kb 


49 


10.50 


25.14 


7.01 


16.80 


43.68 


Total 




53.62 


128.80 


39.81 


95.65 


42.86 



DNA was sequenced on an lllumina HiSeq 2000 (100 bp read lengths) or on the lllumina GAIIx (49 bp reads). Libraries were constructed across a range of insert 
sizes, from 200 bp to 10 kb. The final assembly after filtering consisted of 39.81 Gb of data with 95x coverage of the genome. 



There were 5,068 gene families shared among all four 
hymenopteran species, and 1,981 predicted genes that ap- 
pear to be unique to the L. albipes lineage (Figure 2). 
Functional enrichment analyses were conducted using 
chi-square and Fisher Exact tests (for small sample sizes) 
to calculate significance, and an FDR correction was ap- 
plied to account for multiple testing [27]. The gene ontol- 
ogy (GO) and InterPro protein domain (IPR) enrichment 
results for L. albipes-specific genes are listed in Additional 
files 8 and 9. Among these 1,981 unique genes, many are 
associated with the integrin-mediated signaling pathway 
(P <0.001) and have an over-representation of protein do- 
mains associated with nucleases (P <0.02), MADF/BESS 
domains (P <0.0001), and ankyrin and PRANC domains 
(P <0.0001). 

Gene family expansion 

Two notable gene families appear to be expanded in the 
L. albipes lineage: glutathione-S -transferases (GSTs) and 
the inositol monophosphatases (IMPs) (Figure 3). The 
GST gene family is associated with the metabolism of sec- 
ondary compounds and insecticides in insects. Specifically, 
these enzymes catalyze a reaction between glutathione 
and these compounds, making them more soluble and 

Table 2 Genome assembly statistics 

32,498 
14,618 
4,377 
616,426 
152 

3,533,895 
13,448 
97.98/100 



Contigs (n) 
Largest contig (bp) 
Scaffolds >1 kb (n) 
N50 scaffolds (bp) 
Scaffolds >N50 (n) 
Largest scaffold (bp) 
Predicted genes 

Ultra-conserved core eukaryotic genes 
(complete/partial, %) 



Summary statistics for final L albipes genome assembly. 152 scaffolds are 
greater than the N50 of 616 kb, with the largest assembled scaffold containing 
3.5 Mb. The genome assembly appears to be nearly complete, with 98% of all 
core eukaryotic genes completely assembled (complete) and 100% at least 
partially assembled (partial). The official gene set contains 13,448 
predicted genes. 



easier to excrete. This gene family also plays an important 
role in intracellular transport, hormone biosynthesis, and 
protection against oxidative stress [28]. The L. albipes 
genome contains nine members of this gene family, in 
contrast to four genes in A. mellifera. Only two of the four 
A. mellifera orthologs appear to be duplicated in L. albipes 
(Figure 3A). The inositol monophosphatase gene family is 
a group of dephosphorylating enzymes used to free myo- 
inositol in eukaryotic taxa [29] and is associated with lipid 
metabolism. L. albipes has seven genes in this family, 
while A. mellifera has only three (Figure 3B). The expan- 
sions of the IMP gene family in L. albipes may reflect the 
life history of this species where, unlike A. mellifera, foun- 
dresses must undergo diapause as adults prior to founding 
a new nest in the spring and as a result, efficient nutrient 
storage and lipid metabolism may be particularly crucial 
to survival and reproduction. 

We also characterized over- and under-represented 
IPR domains in the L. albipes gene set in comparison 
to A. mellifera. IPR domains with > 2-fold difference be- 
tween L. albipes and A. mellifera were considered as 
over- or under-represented. There were 92 IPR domains 
overrepresented in L. albipes (Additional file 10), includ- 
ing some associated with the expanded gene families 
discussed above, such as IPR017933 and IPR000760. 
Additionally, the MADF domain (IPR006578) has 42 
copies in L. albipes but only nine in A. mellifera. This 
domain is associated with transcription factor Adf-1 in 
Drosophila, and is known to play a role in the regulation 
of alcohol dehydrogenase expression [30]. There are also 
several fatty acid-related domains over-represented in 
L. albipes (IPR015876, IPR005804 and IPR020842). Pre- 
vious studies in H. saltator found expression of a fatty 
acid synthase to be upregulated in reproductive females 
relative to workers [31]. 

Gene family contraction 

Several genes appear to be lost in L. albipes but present 
in the six other sequenced hymenopterans (Additional 
file 11), and 36 IPR domains are under-represented in 
the L. albipes gene set when compared with A. mellifera 
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Figure 1 Phylogenetic placement of L albipes. Four-fold degenerate sites were used to reconstruct the phylogeny of eight sequenced insect 
genomes. Numbers at the nodes represent divergence times estimated with the 'mcmc' package in PAML [26]. L albipes and A mellifero diverged 
approximately 70 million years ago. 



(Additional file 12). One under-represented domain 
of interest is IPR017996 ('Major Royal Jelly protein 
(MRJP)). Yellow and royal- jelly-like proteins control ex- 
pression of genes affecting cuticular pigmentation, devel- 
opment, sexual maturation and behavior [32], and are 
associated with caste determination in the honey bee 
[33]. We manually curated the yellow and MRJP gene 
families, and found 10 yellow genes in L. albipes, the 
same number as in A. mellifera. In contrast, only two 
credible MRJP genes were found in L. albipes, similar to 
the number found in the ant species C. floridanus and 



H. saltator. The ML tree of yellow genes and MRJP 
genes is shown in Additional file 13. 

DNA methyltransferases 

Epigenetic mechanisms can play an important role in gene 
regulation and phenotypic plasticity. DNA methylation 
appears to be one of the key mechanisms underlying 
transgenerational epigenetic effects. DNA methylation is 
widespread across Hymenoptera [34] and has been impli- 
cated in caste differentiation in honey bees and ants 
[31,35,36]. 



C. floridanus 



A. mellifera 




N. vitripennis 



L.albipes 



Number of gene families 

Figure 2 Overlap among gene families for four 4 hymenopteran species. Numbers indicate the gene families in each comparison. A total of 
5,068 gene families are shared among all four species. 
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Figure 3 Gene family expansions in L albipes. Two gene families appear to have expanded in the L olbipes lineage. Trees were calculated 
using maximum-likelihood. (A) Glutathione-S-transferase gene family. Nine homologs of this family were identified in L albipes (blue branches) in 
contrast to only four known homologs in A mellifera (orange branches). (B) Inositol monophosphatase gene family. Seven members of this family 
are found in the L olbipes genome (blue branches), in contrast to three homologs in A mellifera (orange branches). 



DNA methyltransferases (DNMTs) are the genes that 
perform DNA methylation; all share a conserved catalytic 
domain, suggesting a common and ancient origin [37]. 
Studies of mammalian systems have established that 
different DNMTs undertake distinct functions (reviewed 
in [38]). For example, human genomes contain two 
DNMTls, one DNMT2, and one DNMT3 (DNMT3a/b). 
DNMT1 is responsible for maintaining the patterns of 
DNA methylation between DNA replications and is re- 
ferred to as the 'maintenance methyltransferase'. The role 
of DNMT2 is still not completely resolved, but recent 
studies suggest that it may act as a tRNA methyltransfer- 
ase. Finally, DNMT3s mediate de novo methylation of pre- 
viously unmethylated cytosines. 

We investigated whether L. albipes exons contain a 
complete repertoire of putative DNMTs. Using a homology- 
based search (Additional files 14 and 15), we found strong 
evidence that the L. albipes genome encodes two puta- 
tive xDNMTls (Lalb_01810 and Lalb_06290), one puta- 
tive DNMT2 (Lalb_08279), and one putative DNMT3 
(Lalb_11571). These results demonstrate that L. albipes 
appears to have an intact DNA methylation system. 

CpG content in and around genes 

DNA methylation patterns show a high degree of conser- 
vation across insect taxa. Methylation appears to occur pri- 
marily in exons and in 5 ' UTRs (untranslated regions) and 
has been implicated in alternative splicing [39-42]. In 
animal genomes, DNA methylation occurs primarily at 
CpG dinucleotides (cytosine followed by guanine). Because 



methylated cytosines undergo frequent deamination and 
tend to mutate to thymines, methylated CpG regions tend 
to have lower frequencies of CpG dinucleotides [43]. To 
investigate the presence of DNA methylation and its influ- 
ence, we examined normalized CpG content (CpG O/E) 
across different genomic regions in L. albipes. Regions with 
lower CpG content than expected are interpreted as a sig- 
nal of methylation [44]. 

Genomic fragments in L. albipes have a CpG O/E of 
1.58, indicating that there is an overabundance of CpG di- 
nucleotides in this species. These results are similar to the 
honey bee, which has a CpG O/E of 1.67 [45]. Despite 
this genome-wide overabundance of CpG dinucleotides, 
L. albipes coding sequences (CDS) and gene bodies exhibit 
lower CpG O/E values than the genomic background 
(P <10" 160 for both), suggesting that DNA methylation 
may impact CpG content in CDS (Figure 4). Furthermore, 
CDSs exhibit significantly lower CpG content than do 
gene bodies, suggesting that that DNA methylation occurs 
primarily in exons (Figure 4A). 

Somewhat surprisingly, GpC O/E values also varied 
significantly between CDS and the genomic background 
(P <10~ 160 ), though these differences were less pronounced 
than they are for CpG O/E (Figure 4B). This was caused 
by G+C content skew in CDS, which are particularly GC- 
enriched in L. albipes (Additional file 16). Interestingly, 
there is a strong negative correlation between GpC O/E 
and GC content as well as between CpG O/E and GC 
content (Additional file 17). After controlling for under- 
lying GC content variation, only CpG O/E ratios exhibit 
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1.0 2.0 3.0 0.0 1.0 2.0 

CpG O/E GpC O/E 

Figure 4 CpG Content in L albipes. (A) Exons and mRNAs have lower CpG than expected, suggesting that gene bodies may be preferentially 
methylated in L albipes, consistent with previous findings in A mellifera. (B) GpC ratios were also calculated as a control. GpC dinucleotides have 
the same sequence composition as CpGs, but are not subject to biased mutation rates due to DNA methylation. 



clear and substantial differences from the genomic back- 
ground (Additional file 18), providing strong support for 
DNA methylation in L. albipes exons. 

Based on these results, we propose that DNA methyla- 
tion occurs in the L. albipes genome, although experimen- 
tal validation will be necessary. A subset of 1,801 genes 
harbor extreme differences in CpG and GpC O/E values 
and are likely to reflect methylation in these regions 
(Additional file 19). The negative correlation between 
normalized CpG and GpC content versus GC content 
is unique in the L. albipes genome and appears to be 
opposite to the pattern observed in the honey bee [46], 
suggesting that additional, unknown evolutionary forces 
are acting on the nucleotide composition of L. albipes. 

Molecular evolution 

L. albipes is the first bee whose genome has been char- 
acterized that does not belong to the corbiculate bees 
(Hymenoptera: Apidae) and, as such, represents a novel 
lineage for comparative studies aimed at identifying the 
molecular toolkit associated with the evolution of social 
behavior. To identify genes showing signatures of accel- 
erated evolution in L. albipes (halictid bees) and/or 
Apoidea (all bees), we conducted branch-specific tests in 
PAML [26]. We chose six species representative of the 
sequenced Hymenopteran genomes to perform these 
analyses. These taxa include two bees (L. albipes and A 
mellifera), two ants (H. saltator and S. invicta), the para- 
sitoid wasp, N. vitripennis, as an outgroup to the social 
Hymenoptera, and the fruit fly, D. melanogaster, as an 
outgroup to the Hymenoptera as a whole. We used the 
'branch' model in PAML to search for signatures of 



accelerated evolution in focal lineages using a likelihood- 
ratio test (LRT) to calculate significance. Following cor- 
rection for multiple testing, 615 genes showed signatures 
of accelerated evolution in the L. albipes lineage when com- 
pared to the remaining branches (FDR<0.05; Additional 
file 20), and 899 in Apoidea when contrasted to the remain- 
ing branches (FDR <0.05; Additional file 21). 

Functional enrichment analyses for these genes are sum- 
marized in Additional files 22, 23, and 24. In general, 
genes associated with carboxylic acid metabolism, cell sig- 
naling, and protein transport appear to be subject to accel- 
erated evolution in L. albipes relative to the other species 
examined (Figure 5). Heat shock proteins (Additional file 
23) are also evolving more quickly in the halictid lineage. 
This gene family is known to play a key role in diapause in 
a number of insect species [47] and is potentially interest- 
ing given the population-level correlation between social- 
ity and microclimate in L. albipes [11,18]. Furthermore, 
many halictid species exhibit strong behavioral plasticity 
in response to local environment [17-20]. It is possible 
that the accelerated evolution of heat shock proteins may 
reflect this groups ability to determine the behaviorally 
appropriate response to environmental conditions. 

Differences between social forms 

To look for gross genetic differences between social forms, 
individual females from one solitary and one social popu- 
lation were sequenced to approximately 15x coverage. Re- 
mapping of these reads to the reference genome revealed 
that these individuals vary as much from each other as 
they do from the reference sequence, with 499,486 SNPs 
unique to the solitary female (compared to the social 
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female and the reference sequence) and 493,579 SNPs 
unique to the social female (compared to the reference 
and solitary female). Per-site Wattersons 0 W was calcu- 
lated using four-fold degenerate nucleotides, and is esti- 
mated at 0.003. Comparisons with A mellifera (0 W 
approximately 0.09 for non-coding sites) ([48]) suggest po- 
tentially reduced level of genetic diversity of L. albipes 
compared to the honey bee. 

The numbers of synonymous (Ks) and non-synonymous 
(Ka) substitutions and their ratio (Ka/Ks) were calculated 
for each gene to search for loci showing signatures of ac- 
celerated evolution between social forms. Sliding window 
analyses did not reveal any large genomic regions that ap- 
pear to be differentiated between social forms, but six 
genes had Ka/Ks values >1, indicating that these sequences 
could be diverging rapidly between these two populations 
(FDR <0.1; Additional file 25). Two of these genes encode 



a putative odorant receptor (Lalb_14702) and a cuticular 
protein similar to apidermin-3 (Lalb_0725) ([49]), perhaps 
indicating that differences in chemical signaling and/or 
pheromone production are associated with shifts in social- 
ity. The remaining genes include: a metalloendopeptidase 
similar to neprilysin 1, which is associated with modula- 
tion of neurotransmitter levels and expressed in the brain 
mushroom bodies in Drosophila [50], and a receptor- type 
tyrosine-protein phosphatase associated with the regula- 
tion of axon guidance and also with autism in humans 
[51]. These genes provide an interesting set of candidates 
for further examination, but given that these genetic dif- 
ferences were characterized between two individuals, fur- 
ther work examining multiple individuals from a number 
of solitary and social populations is needed to fully 
characterize signatures of selection between the solitary 
and social behavioral forms within L. albipes. 
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Conclusions 

Its intraspecific behavioral variation makes L. albipes ideal 
for population and ecological genomic studies to char- 
acterize the underlying genetic components associated with 
solitary and social behavior. Our results suggest that mech- 
anisms associated with DNA methylation and nutrient stor- 
age may play a role in modulating social behavior in this 
species, and future research will examine these pathways in 
more detail. The addition of L. albipes to the published hy- 
menopteran genomes establishes a framework for further 
phylogenetic comparisons that we can use to investigate 
forces that have shaped the evolution of social behavior. 

Materials and methods 

Sample preparation and library construction 
Genomic DNA 

Whole bodies were first rinsed in ethanol then ground 
in liquid nitrogen to facilitate extraction of genetic ma- 
terial. DNA extractions were performed using a Qiagen 
Genomic-tip 20/G kit (Valencia, CA, USA) and standard 
protocol. Genomic DNA extracted from the samples Albi-2 
and Albi-3 was used to generate non-amplified DNA librar- 
ies of 200 and 500 bp. To obtain sufficient genomic DNA, 
we performed a multiple displacement amplification on 
the Albi-2 sample using the REPLI-g Midi kit (Qiagen, 
Valencia, CA, USA) prior to library construction. This may 
have contributed to lower GC content within these librar- 
ies, and as such, additional 200, 500, and 800 bp libraries 
were constructed from unamplified genomic DNA from 
the Albi-3 sample. To improve genome assembly, genomic 
DNA from 20 pooled females was also used to construct an 
amplification-free 10 kb library. 

For library construction, DNA was sheared to fragments 
of size 200 to 500 bp. Ends were repaired, A-tailed, and li- 
gated to paired-end adapters (Illumina, San Diego, CA, 
USA). A size selection was then performed by agarose gel, 
and fragments were amplified with LM-PCR. Long-insert 
libraries were constructed by shearing genomic DNA to 
the appropriate insert size with nebulization (2 kb library) 
or HydroShear (5 kb and 10 kb libraries; Covaris, Woburn, 
MA, USA). Fragments were end-repaired with biotinyl- 
ated nucleotide analogues (Illumina, San Diego, CA, USA), 
and size-selected fragments (2 kb, 5 kb, and 10 kb) were 
circularized via intramolecular ligation, sheared to 500 bp 
with Adaptive Focused Acoustic (Covaris, Woburn, 
MA, USA), and purified on magnetic beads (Invitrogen, 
Carlsbad, CA, USA). These purified fragments were 
then end-repaired, A-tailed, and ligated to paired end 
adapters (Illumina). A final size selection step and LM-PCR 
purification was conducted prior to sequencing. 

Total RNA 

For transcriptome sequencing, RNA was extracted from 
four individual females using a Qiagen RNeasy extraction 



kit (Valencia, CA, USA) and standard protocol. RNA was 
then pooled and cDNA libraries constructed. 

First strand cDNA was synthesized using random hex- 
amers and Superscript II reverse transcriptase (Invitro- 
gen, Carlsbad, CA, USA). R coli DNA Poll (Invitrogen) 
was used for second strand synthesis, and double 
stranded cDNA was then purified using the Qiaquick 
PCR purification kit (Qiagen, Valencia, CA, USA). Puri- 
fied cDNA was sheared to 100 to 500 bp fragments with 
a nebulizer (Invitrogen), end-repaired, and a 3' dA over- 
hang added to the ends. Illumina adapters were ligated 
to the cDNA and size selected to 200 ± 20 bp on an 
agarose gel. Fifteen cycles of PCR amplification were 
conducted prior to sequencing. Gel-purification of 18 to 
30 nt RNA was used for smRNA-seq. 5 ' and 3 ' Illumina 
RNA adapters were ligated to these fragments, and 
products were size-selected on a denaturing polyacryl- 
amide gel. These purified products were then reverse 
transcribed with small RNA RT primers and amplified 
with small RNA PCR primers 1 and 2 (Illumina) with 
15 cycles of PCR prior to sequencing. 

Genome assembly 

SOAPdenovo [52] was used for genome assembly. We 
constructed a de Bruijn graph using the parameter '-K 
47'. Then, default parameters were used to simplify the 
graph and generate contigs by removing tips, merging 
bubbles and solving repeats. All sequenced reads were 
then realigned onto the contig sequences with the pa- 
rameters: '-k 47 -f\ Finally, scaffolds were constructed 
by weighting the rates of consistent and conflicting 
paired-end relationships with parameter: -F -u\ All us- 
able reads were realigned to contigs and paired-end in- 
formation was used to assemble scaffolds and close gaps. 
Raw sequencing reads were mapped back to the scaf- 
folds using SOAPaligner [53] with options '-m 0 -x 1000 -v 
5' for 200 bp, 500 bp, and 800 bp libraries, '-m 0 -x 
10000 -R -v 3' for 2 kb and 5 kb libraries, and '-m 0 -x 
20000 -R -v 3' for the 10 kb library. The results were 
used to check for GC bias in the sequencing data. Po- 
tential contaminants were filtered using BLASTN to 
align all assembled sequences against the NCBI nt data- 
base (version: 20110312), and sequences with best hits 
to bacterial or fungal sequences were removed from the 
assembly and excluded in downstream analyses. Com- 
pleteness of the assembly was assessed using the CEGMA 
pipeline [22]. 

Repeat annotation 

Known transposable elements (TEs) were identified with 
RepeatMasker (version 3.2.6) ([54]) using the Repbase 
TE library (v. 15.02) ([55]) and default parameters. A 
consensus sequence for each repeat family was gener- 
ated and used as the library in RepeatMasker to identify 
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additional high and medium copy repeats (>10 copies) 
in the genome assembly. Tandem repeats were predicted 
using TRF [56], with parameters set to 'Match = 2, Mis- 
match = 7, Delta = 7, PM = 80, PI = 10, Minscore = 50, 
and MaxPeriod = 12\ In total, 107.29 Mb of repetitive 
sequence were identified, comprising 32.71% of the as- 
sembled genome (Additional file 2). 

Protein-coding gene annotation 

Three different methods were used to predict protein- 
coding genes: (a) homology-based methods, (b) de novo 
prediction, and (c) RNA sequencing. Gene sets from 
four species (C. elegans, D. melanogaster, H. saltator, 
and A. mellifera) were used for homology-based predic- 
tions, one species at a time. TBLASTN was used to 
search the non-redundant protein sequences of each 
gene set with an E-value <le-5. The best hit was then se- 
lected, and regions with homologous blocks shorter than 
50% of the query protein were excluded. We then used 
GENE WISE (v. 20.0) ([57]) to generate the gene struc- 
tures. Homology-based repeats were masked in the gen- 
ome, and AUGUSTUS [58] and SNAP [59] were used 
for de novo gene prediction. Parameters were trained 
using 2,682 high quality genes with intact ORFs based 
on homology to A. mellifera. Evidence derived from 
homology-based predictions (4 sets) and de novo predic- 
tions (2 sets) were then integrated in GLEAN to gener- 
ate a consensus gene set. Based on these analyses, 
22,068 genes passed the GLEAN criteria. 

To improve gene annotation, RNA sequencing was 
performed on four pooled adult females collected from 
field sites in France and Switzerland (Additional file 1). 
Reads were mapped to the current genome assembly 
using Tophat [24], and then Cufflinks [25] was used to 
assemble the mapped reads into transcripts with the fol- 
lowing parameters for Tophat: '-r 20 -mate-std-dev 10 -I 
10000', and for Cufflinks: '-I 50000'. ORFs were predicted 
in assembled transcripts using BGIs in-house pipeline, 
CCG. CCG also integrated the gene models from GLEAN 
with the transcript-based models in Cufflinks to generate 
an improved gene set. 

Finally, manual curation and visual screening was per- 
formed to refine the final gene set. The transcript-based 
gene models with intact ORFs that had no overlap with 
the CCG gene set were added. If a transcript-based gene 
model with an intact ORF covered more than one 
homology-based gene, the homology-based gene would 
be replaced by the transcript-based gene model. Gene 
models supported by more than homology prediction 
but that had no overlapping genes in the gene set were 
added. Gene models predicted to be transposable 
element-related (based on IPRscan and Swiss-Prot anno- 
tation) were removed. Furthermore, genes of particular 
interest (for example, the expanded and contracted gene 



families) were manually checked. A final gene set of 
13,448 genes was used for downstream analysis. 

Functional annotation 

Protein function was assigned using BLASTP best hits 
to the Swiss-Prot database (E-value <le-5). Gene motifs 
and domains were determined using by InterProScan 
[60] against the InterPro database [61]. Gene Ontology 
(GO) annotations for each gene were obtained from the 
corresponding InterPro entry. The KEGG orthology [62] 
annotation was done by KAAS online server [63] using 
the SBH method. The pathways in which each gene 
might be involved were derived from the best KO hit. 
The statistics of functional annotation is provided in 
Additional file 11. All functional enrichment analyses 
were conducted using custom scripts. 

ncRNA annotation 

ncRNAs were predicted using INFERNAL [64] and 
tRNAscan-SE [65]. Four types of ncRNA were anno- 
tated: microRNA (miRNA), transfer RNA (tRNA), ribo- 
somal RNA (rRNA), and small nuclear RNA(snRNA). 
tRNA genes were predicted by tRNAscan-SE with 
eukaryote parameters. rRNA fragments were identified 
by aligning the rRNA template sequences from inverte- 
brate animal using BLASTN with an E-value <lE-5. 
miRNA and snRNA genes were predicted by INFERNAL 
using the Rfam database (release 9.1). To accelerate the 
speed, a rough filtering was performed before INFER- 
NAL, by Blastn against the Rfam sequence database with 
an E-value cutoff of 1. Additional file 12 summarizes the 
statistics of ncRNA annotation. 

Construction of gene families 

To gain insight into the evolution of L. albipes gene fam- 
ilies, we used Treefam [66] to cluster protein-coding genes 
from eight insect species (A. echinatior, S. invicta, C. 
floridanus, H. saltator, L. albipes, A. mellifera, N. vitripen- 
nis, and D. melanogaster) into gene families. Only the lon- 
gest transcript isoform was used for each gene. BLASTP 
(with E-value <le-5) was performed against a blast data- 
base including protein-coding sequences for all species. 
Graph based methods were used to join fragmental align- 
ments for each gene using the solar package in Treefam. 
We assigned a connection (edge) between two nodes 
(genes) if more than one-third of the region was aligned 
in both genes. A H-score ranging from 0 to 100 was used 
to weigh the similarity (edge). For two genes Gl and G2, 
the H-score was defined as score (GlG2)/max (score 
(G1G1), score (G2G2)), (score = BLAST raw score). We 
used the average distance for the hierarchical clustering al- 
gorithm, requiring the minimum edge weight (H-score) to 
be larger than 10, the minimum edge density (total num- 
ber of edges/theoretical number of edges) to be larger 
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than 0.34. After clustering, we generated multiple align- 
ments of protein sequences for each gene family using 
MUSCLE [67], and converted the protein alignments to 
CDS alignments. A Venn diagram including the Hymen- 
opteran species is shown as Additional file 13. 

Phylogeny construction 

Four-fold degenerate sites were used from the single-copy 
gene family alignments, and used to reconstruct the phyl- 
ogeny of these eight species in MrBayes [68] with default 
parameters. Divergence times of the nodes were inferred 
using the mcmctree package in PAML [26]. 

Gene family expansion and contraction 

We used CAFE [69] to identify gene family expansions 
and contractions in L. albipes. This revealed two gene 
family expansions: glutathione-S -transferases (Additional 
file 15), and inositol monophosphatase (Additional file 
16). Maximum-likelihood (ML) trees of the expanded 
families were constructed with PhyML [70]. Roots of 
the trees were determined using the 'root' function in 
TreeBest [71]. 

Genes specific to L albipes 

We performed functional enrichment analyses with cus- 
tom scripts using chi-squares and Fisher Exact tests (for 
small sample sizes) to calculate statistical significance. 
We then performed an FDR [27] correction to account 
for multiple testing. The GO/IPR/KEGG enrichment re- 
sults for L. albipes specific genes are listed in Additional 
file 7: Tables S7, Additional file 8: Table S8 and Additional 
file 9: Table S9. 

Gene loss 

A gene was considered to be lost if it was absent in 
L. albipes but present in the six other hymenopteran in- 
sects (A echinatior, S. invicta, C. floridanus, H. saltator, 
A mellifera and N. vitripennis). To ensure these genes 
were not due to incorrect clustering or uncompleted an- 
notation, we realigned these genes against the genome 
assembly. Genes that failed to pass the previous gene 
prediction criteria, but that had strong evidence of 
homology (Genewise score > = 70) and were supported 
by expression data (average coverage depth by RNA-seq 
data > 1) were reintegrated into the final gene set. Fol- 
lowing this, 30 families were found to be lost in 
L. albipes. Each of the 30 families has only one homolog 
in A. mellifera (Additional files 11 and 12). 

Ortholog identification 

We used the other 11 species, including nine Hymenop- 
tera species A. mellifera, N. vitripennis, H. saltator, 
C. floridanus, A. echinatior, S. invicta P. barbatus A. 
cephalotes and L. humile, as well as D. melanogaster and 



H. sapiens to identify ortholog groups with L. albipes 
with BLASTP (Additional file 24). For each gene set, we 
performed all- against- all blasts. Then, we filtered the re- 
sults by requiring the aligned rates of both target and re- 
quired that the query must be >50%. We used the 
reciprocal best hit (RBH) of Blast score to determine 
orthologs. 

Characterization of DNA methyltransferases 

We performed a BLASTP search against the human 
(Homo sapiens), honeybee (A. mellifera), chicken {G alius 
gallus), and Nasonia (N. vitripennis) dnmts using all 
L. albipes proteins as queries. Then potential L. albipes 
homologs and their query sequences were used to con- 
struct a phylogenetic tree using maximum-likelihood 
using the JTT model. 

CpG content 

We calculated the normalized CpG content in four types 
of sequences: exons, introns, UTR, and whole-genome 
genomic fragments of 1,000 bp. These values were esti- 
mated using the formula: 



CpGo/E 



"CpG 



P c *Pg 



where Pc p g> Pa an d Pg represent the frequencies of 
CpG dinucleotides, C nucleotides, and G nucleotides, re- 
spectively, estimated from each genomic fragment. Be- 
cause GpC dinucleotides have the same sequence 
composition as CpG dinucleotides, but are not subject 
to DNA methylation, this calculation represents a nega- 
tive control. See [44] for further methodological details. 

To account for the strong negative correlation between 
G+C content and both CpG and GpC O/E values, we di- 
vided genes and genomic fragments into five groups ac- 
cording to their G+C content, specifically, G+C < 0.35, 
0.35 < G+C < 0.45, 0.45 < G+C < 0.5, 0.5 < G+C < 0.55 and 
G+C > 0.55. This allowed us to compare CpG O/E and 
GpC O/E of different genomic regions while accounting 
for G+C content. 

Potentially methylated CDS are defined as those with 
significantly lower CpG O/E than the genomic back- 
ground while exhibiting not significantly different GpC 
O/E than the genomic background. We performed a per- 
mutation test to determine whether CpG O/E of a specific 
gene is significantly lower than genome background. 
P values were determined as the ratio of 1,000 bp ge- 
nomes fragments whose CpG O/E (GpC O/E) is lower 
than the focal fragment. The P values were then FDR- 
adjusted for multiple testing. With FDR <0.2, we observed 
1,814 CDs with significantly lower CpG O/E while only 27 
CDs with significant lower GpC O/E. Among the 1,814 
CDs, 13 of them exhibit both significantly lower CpG O/E 
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and GpC O/E. After discarding those 13 CDs, we consid- 
ered the remaining 1,801 CDs as potentially methylated. 

Molecular evolution 

We used PAML to search for genes showing signatures 
of accelerated evolution in (a) the L. albipes lineage or 
(b) the Apoidea lineage. We chose six species (L. albipes, 
A. mellifera, H. saltator, S.invicta, N. vitripennis, and 
D. melanogaster) to perform the accelerated evolution 
analyses. First, the phylogenetic tree was inferred from 
the four-fold degenerate sites of orthologous groups in 
the six species. To do LRTs with PAML, we ran one-rate 
branch models ('model = 0' in PAML control file) and 
two-rate branch models ('model = 2' in PAML control 
file). Two kinds of two -rate branch models were run: 
one for the L. albipes lineage, the other for the Apoidea 
lineage. Other parameters set in the PAML control file 
were codonfreq = 2, kappa = 2.5, initial omega = 0.2, and 
fix alpha = V. P values were FDR-adjusted with a cutoff 
of 0.05. Functional enrichment analyses were then con- 
ducted for the genes that were found under accelerated 
evolution (Additional files 14, 15, and 16). 

Individual resequencing data 

Individual females from a solitary and social population 
were sequenced to approximately 15 x coverage on an 
Illumina HiSeq (2 x 150, paired end) in order to look for 
large genetic differences between social forms. DNA 
from each female was extracted using the AutoGen 
DNA extraction kit (AutoGen, Holliston, MA, USA). 
Whole bodies were first rinsed in ethanol then ground 
in liquid nitrogen to facilitate extraction of genetic ma- 
terial. DNA was sheared to approximately 400 bp using 
HydroShear (Covaris, Woburn, MA, USA). Libraries 
were constructed using the PrepX ILM DNA Kit for the 
Apollo 324 system (IntegenX, Pleasanton, CA, USA), 
and sequenced with a Rapid Run (2 x 150 bp) on an 
Illumina HiSeq 2500. 

Reads were quality checked using FastQC [72] and 
were then mapped back to the reference genome using 
Stampy [73] using default parameters. Variants were 
called following the best practices in the Genome Ana- 
lysis Toolkit v2.7.2 (Broad Institute, Cambridge, MA, 
USA) and included a local realignment step and variant 
calling with Haplotype Caller. SNPs were filtered using 
the following parameters: QD <2.0, FS >60.0, MQ <40.0, 
HaplotypeScore >13.0, MappingQualityRankSum <-12.5, 
and ReadPosRankSum <-8.0. Nucleotide diversity was 
calculated using vcftools [74], and 0 W was estimated 
from four-fold degenerate nucleotides using custom 
scripts (available upon request). Ka/Ks calculations were 
performed using KaKs calculator using the YN model 
averaging method [75], and FDR-corrections were per- 
formed with the P adjust package in R (v2.12). Sliding 



windows were calculated in 100 kb increments across 
the genome to look for tracts with a high degree of dif- 
ferentiation. Alignments of significant genes were manu- 
ally checked; one gene (Lalb_07521) was excluded from 
the list due to uncertainty in read mapping (possibly 
from a paralogous gene). 

Data access 

This whole genome shotgun project has been deposited at 
DDBJ/EMBL/GenBankunder the accession SRP016091. 
The version described in this paper is the first version, 
SRP016091. The RNA sequencing reads have been depos- 
ited in the short read archive under the accession 
SRX190462, and the individual resequencing data have 
been deposited in the short read archive under the acces- 
sions SAMN02429130 and SAMN02429131. 

Additional files 



Additional file 1: Sample information. Sample collection data for 
specimens used in genome and transcriptome sequencing. Sample 
names, sex, collection dates, region, and GPS coordinates are specified, as 
well as the libraries each specimen was used to construct. 

Additional file 2: Repeats in the genome. Repeat annotation was 
conducting using RepeatMasker. The overlaps between repeats have 
been excluded before the calculation of the total size. The length and 
percent of the genome comprised by each repeat is included. 

Additional file 3: Genome assembly comparisons. Comparison of 
genome assemblies for sequenced hymenopteran species. L albipes is 
highly comparable to these other sequenced species. 

Additional file 4: Gene prediction statistics. Gene prediction relied on 
three strategies: de novo prediction, homology-based approaches using 
four well-annotated genomes, and RNA sequencing (CCG). Statistics 
indicate the number of genes annotated with each method, the average 
transcript and coding sequence (CDS) lengths, the average number of 
exons per gene, and the average exon and intron lengths. 

Additional file 5: Gene predictions in comparison to other 
sequenced insect genomes. Comparisons of coding sequence (CDS), 
mRNA, exon, and intron length were conducted across five arthropod 
genomes. Amel: Apis mellifera, Cele: Caenorhabditis elegans, Dmel: 
Drosophila melanogaster, Hsal: Harpegnathos saltator, Lalb: Lasioglossum 
albipes. 

Additional file 6: Orthology between L albipes and other species. 

The top row includes the number of genes annotated in the current 
L. albipes assembly, and subsequent rows represent the number of 
orthologs in L. albipes in comparison with each named species, all 
sequenced ants (H. saltator, C. floridanus, A. echinatior, 5. invicta, L. humile, 
P. barbatus, and A. cephalotes), and all sequenced Hymenoptera (all ants 
plus A mellifera and N. vitripennis). 

Additional file 7: Non-coding RNA genes in the genome. Annotated 
ncRNA summary statistics. The average length of miRNA is for the 
predicted precursor miRNA. The number of copies annotated in the 
genome, their average length in basepairs, summed total length, and the 
percentage of the genome comprised by each element are included. 

Additional file 8: GO enrichment in L. albipes specific genes. The 

P values were adjusted by FDR and the cutoff of adjusted P value is 0.05. 

Additional file 9: IPR enrichment in L albipes specific genes. The 

P values were adjusted by FDR and the cutoff of adjusted P value is 0.05. 

Additional file 10: IPR domains over-represented in the L albipes 
lineage. The domains that have at least 10 copies are included in this 
table. Additional columns report the number of domains characterized in 
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each species. Aech: A. echinatior, Amel: A. mellifera, Cflo: C. floridanus, 
Dmel: D. melanogaster, Hsal: H. saltator, Lalb: L albipes, Nvit: A/, vitripennis, 
Sinv: S. invicta. 

Additional file 11: Putatively lost genes in L albipes lineage. Genes 
that appear to be lost in the L. albipes lineage are included in this table. 
The functions are derived from Swiss-Prot annotation database. Amel 
gene IDs represent the gene annotation symbol in the Apis mellifera 
genome assembly. 

Additional file 12: IPR domains under-represented in L albipes 
lineage. IPR domains under-represented in the L. albipes lineage are 
included in this table. Additional columns report the number of domains 
characterized in each species. Aech: A. echinatior, Amel: A. mellifera, Cflo: 
C. floridanus, Dmel: D. melanogaster, Hsal: H. saltator, Lalb: L. albipes, Nvit: 
N. vitripennis, Sinv: 5. invicta. 

Additional file 13: Phylogenetic tree of yellow and MRJP genes. The 

MRJP genes are highlighted in light green (top), yellow genes highlighted 
in light blue (bottom). Red branches are A. mellifera orthologs, and dark 
blue branches are L. albipes. 

Additional file 14: Putative DNMT homologs in L albipes. Putative 
DNMT homologs in L. albipes were identified using a BLASTP search 
against human, chicken, Nasonia, and honey bee (A mellifera). L. albipes 
gene IDs, the target ID, and the E-values are included in this table. 

Additional file 15: Maximum likelihood tree of DNMT orthologs. 

A BLASTP query of the putative dnmt homologs of L. albipes (Lalb) to 
human (Hsap), honey bee (Amel), chicken (Ggal), Nasonia (Nvit), and 
Drosophila (Dmel) revealed four L. albipes genes that are putative DNA 
methyltransferases. A maximum-likelihood tree depicts the relationships 
among the three DNMTs and their respective orthologs in each species. 
Bootstrap values indicate level of support at each node. 

Additional file 16: Distribution of GC content in L albipes. L albipes 
exons are G+C enriched compared to the genomic background, while 
introns have lower G+C contents compared to the genome. 

Additional file 17: CpG and GpC 0/E ratios are negatively 
correlated. (A) CpG 0/E and (B) GpC 0/E are strongly negatively 
correlated with G+C contents. Consequently, CDs exhibit lower GpC 0/E 
compared to the genomic background. 

Additional file 18: CpG and GpC 0/E ratios by GC content. Genes 
and genomic fragments were divided into five groups according to their 
G+C content. Our results show that across all the groups, CpG 0/E values 
of CDS are still significantly lower than that of the genome background 
when GC content is minimized, while GpC 0/E values of CDS are highly 
similar to those of genome background. 

Additional file 19: Candidate genes for methylation. A total of 1,801 
genes have significantly lower CpG 0/E ratios than the genomic 
background but not significantly different GpC 0/E (FDR <0.2). These 
represent strong candidates for DNA methylation. GenelD names, CpG 
0/E, GpC 0/E, and FDR-corrected P values are included in this table. 

Additional file 20: Genes showing signatures of accelerated 
evolution in L albipes. Genes showing signatures of accelerated 
evolution in L albipes relative to other tested lineages. Null omega is the 
expected omega value; L. albipes alternative omega is the estimated omega 
value for the L. albipes lineage as compared to the other tested lineages. 

Additional file 21: Genes showing signatures of accelerated 
evolution in Apoidea. Genes showing signatures of accelerated 
evolution in Apoidea (bees) relative to other tested lineages. Null omega 
is the expected omega value; Apoidea alternative omega is the 
estimated omega value for the Apoidea branches as compared to the 
other tested lineages. 

Additional file 22: GO enrichment of genes undergoing accelerated 
evolution in L albipes. Results of Gene Ontology analyses for genes 
experiencing accelerated evolution in L albipes. BP: biological process, 
CC: cellular component, MF: molecular function. 

Additional file 23: IPR enrichment of genes experiencing 
accelerated evolution in L albipes. IPR enrichment analysis results with 
IPR IDs and titles for genes experiences accelerated evolution in L. albipes 
relative to other tested lineages. 



Additional file 24: KEGG pathway enrichment genes undergoing 
accelerated evolution in L albipes. KEGG analysis revealed several 
pathways associated with genes experiencing accelerated evolution in 
the L. albipes lineage. MapID and Map Title are specified according to the 
KEGG database. 

Additional file 25: Individual resequencing. Ka/Ks calculations using 
genome sequences for a solitary and social female identified six genes 
that appear to be experiencing positive selection between social forms 
(FDR <0.1). These genes, the length of the coding sequence, synonymous 
(Ks) and non-synonymous (Ka) substitutions, and their ratio (Ka/Ks) are 
summarized in this table. 
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